Foreword

The National Household Education Survey 
(NHES) represents a major new initiative of the 
National Center for Education Statistics (NCES). 
Between February and May of 1991, the NHES was 
fielded for the first time as a mechanism for collecting 
data on two different sectors of education policy 
interest: the early childhood education experience of 
young children and participation in adult education. 
Because the NHES methodology is relatively new and 
relies on some innovative approaches, a field test of 
the methodology was an essential first step in the
development of the survey. Many of the methods
evaluated during the 1989 NHES field test were
adopted for the full-scale survey. 

A large field test of approximately 15,000 
households was conducted during the fall of 1989. A
number of methodological issues associated with 
collecting and analyzing data on education issues from 
a random digit dialing telephone survey were 
examined. This report is one of five that describe the
1989 NHES Field Test experience. The five reports 
are the first in a series of technical publications 
pertaining to the design and conduct of the NHES 
that NCES hopes to continue in the years to come. 
NCES believes that the reports contained in this series 
will provide users of the NHES data with a better 
understanding of the NHES methodology and that 
they will assist the survey design efforts of others. 

The first report in this series, Overview of the 
National Household Education Survey Field Test, 
describes the design of the field test and the outcomes 
of the field test data collection activities. It reports on 
the response rates obtained, both unit and item, and 
the burden associated with survey participation. Each
of the next four reports in the series focuses on a 
specific issue that was examined in the 1989 NHES 
field test. 



The second report, Telephone Undercoverage 
Bias of 14- to 21-Year-Olds and 3- to 5-Year-Olds,
analyzes data from the Current Population Survey to 
identify the extent of telephone coverage for two 
distinct populations of interest and the bias associated 
with this type of undercoverage for estimates of school 
dropouts and early childhood education program 
participation. Methods for adjusting survey estimates 
to partially reduce this bias are developed and 
evaluated. 

The third report, Multiplicity Sampling for
Dropouts in the NHES Field Test, examines a 
technique that was used to increase the coverage of 
14- to 21-year-olds and to capture more dropouts in 
the sample. The report describes the effectiveness of 
the multiplicity sample in achieving these goals. 

The fourth report, Proxy Reporting of Dropout
Status in the NHES Field Test, focuses on
measurement errors arising from the use of proxy 
respondents. During the 1989 Field Test, a 
knowledgeable household member was used as a 
source of information on the school enrollment of 
each sampled 14- to 21-year-old in the household. In 
addition, 14- to 21-year-olds were asked to report on 
their own school enrollment. The report describes the 
correspondence between the responses given by proxy 
respondents and those provided by the youths
themselves. 

The fifth report, Effectiveness of Oversampling
Blacks and Hispanics in the NHES Field Test, 
describes the approach used to increase the number 
of black and Hispanic households/youth in the 
sample. During the field test, an approach that uses 
demographic information at the telephone exchange 
level to develop sampling strata was used to 
oversample black and Hispanic households. The 
report examines the yield of the field test sample 
design versus that which would have been expected 
without oversampling. The effects of oversampling on 
the precision of survey estimates are reported. 
Introduction

During the fall of 1989, the Field Test of the 
National Household Education Survey (NHES) was 
conducted by the National Center for Education 
Statistics (NCES) to explore the feasibility of 
collecting education data by telephone from a sample 
of persons in their households. The NHES is the 
first major attempt by NCES to go beyond its 
traditional surveys, which rely upon school-based data 
collection systems and are typically conducted by mail 
or in-person data collection methods. 

A household survey has the potential to provide the 
types of data needed to study current issues in 
education, particularly those which are difficult to 
adequately address through a school-based survey. 
Such issues include dropping out of school, adult and 
continuing education, preschool education, the status 
of former teachers, and home-based education. 
Consequently, the NHES methodology may greatly 
enhance the scope of issues covered by the data 
collection activities of NCES. 

Since the NHES data collection methods were
untested for education surveys, the Field Test was 
developed to evaluate the use of this approach. Two 
topics of broad policy interest were included in the 
Field Test: the early childhood education 
characteristics of 3- to 5-year-olds, and the 
educational status of 14- to 21-year-olds with a
special focus on youth who dropped out of school 
before completing high school. By including both of 
these study areas in the Field Test, the ability to use 
the NHES to study multiple, complex topics, 
employing different sampling requirements and 
respondent rules, could be evaluated.

Westat, Inc., under contract with NCES, conducted 
all of the Field Test interviews using computer- 
assisted telephone interviewing (CATI) methods. 
The use of CATI methods made sampling 
respondents for interviews easy and nearly invisible to 
the telephone respondent, an important benefit when 
several persons may be sampled in a household. 
CATI also directed the interviewers through complex 
skip patterns and provided the opportunity to 
incorporate edit checks to help resolve inconsistencies 
in the data while the respondents were still on the 
telephone. Another major advantage of the use of 
CATI was that data analysis could begin soon after 
data collection ended, because data entry and many 
of the edit checks were done during the interview. 



The sampling scheme used in the Field Test was a 
variant of the Mitofsky-Waksberg random digit dial 
(RDD) procedure in which every residential
telephone number has the same chance of being
drawn into the sample. Because of the need for
more precise estimates of blacks and Hispanics, 
special sampling methods were used to increase the 
sample size for these persons. The design for the 
Field Test was essentially the same as planned for a 
full-scale NHES study, except the overall sample size 
was smaller. 

The sample resulted in collecting data from 15,037 
households representing all civilian, noninstitution- 
alized persons in the 50 states and the District of 
Columbia. Although only persons living in telephone 
households could be sampled for the Field Test, 
adjustments were made in the weights so that the 
estimates of persons living in both telephone and 
nontelephone households could be produced.

Respondents in sampled households were asked a 
series of screening questions. This interview, called 
the Screener, was used to enumerate all the members 
of the household, determine the eligibility of each 
person in the household for the early childhood 
education (3- to 5-year-olds) and youth (14- to 21- 
year-olds) studies, and obtain some data on the 
characteristics of the household. A total of 4,374 
households had at least one person enumerated in 
the Screener who was eligible for an extended 
interview. The response rate to the Screener was 79 
percent. 

The early childhood education interview was 
conducted with the parent or guardian who was 
identified as knowing the most about each sampled 
3- to 5-year-old child's care and education.
Accordingly, this interview was called the Parent 
Interview. Of the 1,551 children identified in the 
Screener, parents completed interviews for 1,530 
children, a completion rate of 99 percent. 

If the household contained any 14- to 21-year-olds, 
then a Household Respondent Interview (HRI) was 
attempted for each of these members. The HRI was 
used to determine the current and previous 
educational status of the youth; this interview could 
be completed by any adult household member who 
knew about the educational activities of the youth, 
including self-reports by the youth. Of the 4,441 
youths identified in the Screener, HRIs were 
completed for 4,313 youths, for a 97 percent completion
rate. As part of a special methodological study
of multiplicity sampling, mothers in a subsample of 
the households were asked to complete the HRI for 
their 14- to 21-year-old children who did not live in 
their household. These youth are included in the 
numbers stated above. 

A Youth Interview (YI) was then attempted for a 
subsample of the 14- to 21-year-olds in the 
household. All the youth who were not currently 
enrolled in school and did not have a high school 
diploma or equivalent (as reported in the HRI), and 
a sample of all other youth, were targeted for the YI. 
The interview contained more detailed items on the 
educational experiences of the youth that could only 
be answered by the youth. Of the 1,863 youths 
sampled, 1,604 completed the YI, a completion rate 
of 86 percent. These numbers include a sample of 
133 youths who did not live in the sampled 
households, but were included through the 
multiplicity sample when their mothers completed the 
HRI. 
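
As a quick arithmetic check, the completion rates quoted above can be recomputed from the counts given in the text. The sketch below is illustrative only; the dictionary labels and the function name are choices made here, not part of the survey documentation.

```python
# Counts quoted in the text: (completed interviews, eligible cases).
counts = {
    "Parent Interview": (1530, 1551),
    "Household Respondent Interview": (4313, 4441),
    "Youth Interview": (1604, 1863),
}


def completion_rate(completed, eligible):
    """Completion rate as a percentage of eligible cases."""
    return 100.0 * completed / eligible


rates = {name: completion_rate(c, e) for name, (c, e) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} percent")
```

Rounded to whole percents, these ratios agree with the 99, 97, and 86 percent figures reported in the text.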

This report describes the usefulness of multiplicity 
sampling procedures in the NHES Field Test, one of 
several methodological studies undertaken in the 
Field Test. The Field Test is described in greater 
detail in another report entitled Overview Report on
the 1989 National Household Education Survey Field
Test, the first in a series of reports on the Field Test.
The Overview Report describes the sample design, 
the data collection methods and instruments, the 
response rates, and other salient aspects of the 
collection and analysis process for the Field Test. 

Multiplicity sampling was used in the Field Test in
an attempt to increase the number of dropouts 
included in the sample and to reduce the bias 
associated with telephone coverage. A sample of 25 
percent of the households was randomly selected to 
examine the use of multiplicity sampling in a 
telephone survey. All women between the ages of 28 
and 65 in the randomly subsampled households were 
asked about their children who did not currently live 
with the mother. The mothers were asked to 
complete an interview for each of these "out-of- 
household" youths, and the youths themselves were 
eligible for sampling to complete an extended 
interview.

The next section describes some of the salient aspects 
of the sample design of the Field Test, especially 
those most related to the multiplicity sampling
procedure. The subsequent sections describe various
aspects of the effectiveness of the multiplicity
sampling procedures, especially with respect to the 
implementation of the procedures in a full-scale 
survey. The last section summarizes the findings and 
makes recommendations for the application of 
multiplicity sampling in a full-scale survey on 
dropouts. 

Data Source and Estimation 
Methods 

As noted above, the NHES Field Test consisted of a
series of related interviews. Figure 1 diagrams the 
flow of these interviews (the Screener, the Household 
Respondent Interview, and the Youth Interview) for 
a sampled household. 

A random sample of 25 percent of all households was 
selected to participate in the multiplicity sample 
experiment. In these households, all females aged 28 
to 65 years were asked to enumerate and complete 
an HRI for each of their 14- to 21-year-old children 
who did not currently live in their household. Youths 
who were living away from home in student housing 
were classified as in-household members. All other 
eligible (i.e., civilian and noninstitutionalized) 14- to 
21-year-olds identified in this process were 
considered "out-of-household" members. 

The Field Test was designed to investigate the 
efficiency of multiplicity sampling in accomplishing 
two goals. One of these goals was to increase the 
sample size for 14- to 21-year-olds, especially for 
dropouts. The other goal was to improve the 
coverage of the 14- to 21-year-old population by 
including youths who live in a household without a 
telephone but have a mother living in a telephone 
household. Of course, there are still youths who are 
not covered in a telephone survey even with the 
multiplicity sample. Youths who live with their 
mothers in nontelephone households and youths who 
live in nontelephone households and do not have 
mothers who live in telephone households are not 
covered. 

One way of representing the population of 14- to 21- 
year-olds appears as figure 2, which shows the 
domains for which estimates are desired. Domains 
A and B are not affected by the multiplicity sample 
because the youths can only be sampled through one 
telephone number (the telephone number of the 
household in which they live). Since the NHES is a
telephone sample, domains D and E are excluded by
design even with multiplicity sampling.

Figure 1. - Flow of NHES interviews for 14- to 21-year-olds

Figure 2. - Domains of 14- to 21-year-olds for the NHES Field Test

The multiplicity sample does have an impact on 
estimates for domains C and F. The use of the 
multiplicity sample makes it possible to cover and 
produce estimates for youths from domain F. It also 
makes it possible to produce two estimates for 
domain C, since a youth in this domain could be 
sampled in two ways. First, the youth's household
could be sampled and an HRI would be attempted
for each of the youths in the household. These
youths would then be eligible for the Youth
Interview. Second, the mother's household could be
sampled and included in the multiplicity sample. The
mother should enumerate the youth as being out-of-
household and complete the HRI for that youth.
These youths would also be eligible for a Youth 
Interview. 
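
The coverage logic described above can be sketched as a small classifier. This is only an illustration: figure 2 is not reproduced here, so the sketch does not try to separate domain A from B or domain D from E, and the function and argument names are hypothetical.

```python
def classify_domain(youth_has_phone, lives_with_mother, mother_has_phone):
    """Group a 14- to 21-year-old into the coverage domains named in the text.

    mother_has_phone should be False when the youth has no mother living in
    a telephone household (including when no mother can report for the youth).
    """
    reachable_through_mother = (not lives_with_mother) and mother_has_phone
    if youth_has_phone:
        # Two chances of selection: own household and the mother's household.
        if reachable_through_mother:
            return "C"
        # Only one chance of selection: the youth's own telephone number.
        return "A or B"
    # Nontelephone youths are covered only through the mother's telephone.
    if reachable_through_mother:
        return "F"
    return "D or E"


print(classify_domain(True, False, True))
print(classify_domain(False, False, True))
```

Under these rules, only the "D or E" group remains outside the reach of the telephone sample.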

Procedures were developed to incorporate the out-of- 
household sample in the estimation process. One
method, called a dual frame approach, makes use of
the fact that a particular domain of persons (youths 
who live in telephone households with mothers who 
live in different telephone households) can be 
estimated in two ways. In the dual frame approach, 
estimates for this domain are made from each frame 
or source and then averaged to form a single 
estimate. The weights used to average the estimates 
from the two domains can be specified so as to 
minimize the variance of the overall estimate. 

The other method of estimation, called a network 
approach, is based upon the determination of the
overall probability of including a youth in the sample.
Youths from all domains except domain C could only
be sampled from one household. For domain C 
youths, the probability of selection involves the 
possibility of sampling from two households: their 
own telephone household and the telephone 
household of their mother. These two approaches to 
estimation result in identical weights for the Field 
Test. 

The estimation procedure included several stages of 
weighting and adjustments, such as the inverse of the 
probability of selection and nonresponse adjustments. 
Below, only those aspects of estimation which are 
directly related to the multiplicity sample are 
described. 

The discussion of multiplicity sample estimation 
procedures can be facilitated by the introduction of 
adjustment factors. The adjustment factors are 
modifications to the standard weights for youths. For 
completeness, the factors associated with each of the 
domains are given, even though some of the domains 
are not affected by the multiplicity sample. Let 



    S = 1             if the youth is in domain A or B;
    S = 4             if the youth is in domain F (subsampled at a rate of .25);
    S = 0.8           if the youth is not an out-of-household youth in domain C; and
    S = 0.8 (.2 x 4)  if the youth is an out-of-household youth in domain C.



The value of S for youths in domains A and B is
unity, since these domains are not affected by the 
multiplicity sampling. In domain F, the value of S is 
4 since one-fourth of the sample was included in the 
multiplicity sample experiment. The domain C 
estimate based upon the youths sampled from their 
own household (in-household youths) has an 
adjustment factor of .8, and the estimate based upon
the out-of-household youths has an adjustment factor
of .2. The factors of .8 and .2 were used because
about 80 percent of the domain C youths were
expected to be sampled from in-household youths.
However, the adjustment factor for the estimate for 
the out-of-household youths must be multiplied by 
four since the households were subsampled at a rate 
of one-in-four. Therefore, the total adjustment for 
out-of-household youths is .8. These adjustment 
factors approximate the optimal factors which are 
proportional to the sample sizes. 
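
The adjustment factors can be checked with a short sketch. The dual frame factors follow the values given in the text. The network calculation is an assumed simplification: it treats the youth's household and the mother's household as having the same base chance of selection and ignores the small chance that both are drawn. All function and constant names are illustrative.

```python
import math

SUBSAMPLING_RATE = 0.25    # share of households in the multiplicity experiment
IN_HOUSEHOLD_SHARE = 0.8   # weight given to the in-household domain C estimate


def dual_frame_factor(domain, out_of_household=False):
    """Adjustment factor S under the dual frame approach."""
    if domain in ("A", "B"):
        return 1.0
    if domain == "F":
        return 1.0 / SUBSAMPLING_RATE
    if domain == "C":
        if out_of_household:
            # Composite share (.2) times the inverse subsampling rate (4).
            return (1.0 - IN_HOUSEHOLD_SHARE) / SUBSAMPLING_RATE
        return IN_HOUSEHOLD_SHARE
    raise ValueError(f"domain {domain!r} is not covered by the telephone sample")


def network_factor(domain):
    """Adjustment factor from the overall chance of selection (assumed form)."""
    if domain in ("A", "B"):
        return 1.0
    if domain == "F":
        return 1.0 / SUBSAMPLING_RATE
    if domain == "C":
        # One chance through the youth's own household plus a one-in-four
        # chance through the mother's household, relative to a single chance.
        return 1.0 / (1.0 + SUBSAMPLING_RATE)
    raise ValueError(f"domain {domain!r} is not covered by the telephone sample")


factors_agree = all(
    math.isclose(dual_frame_factor("C", out), network_factor("C"))
    for out in (False, True)
)
print(factors_agree)
```

Under these assumptions both approaches give a domain C factor of .8, which is consistent with the statement that the two approaches produce identical weights for the Field Test.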

Completion Rates, Sample Sizes and 
Characteristics of Out-of-Household 
Youth 

In this section, the completion rates from the Field 
Test data for the HRI and the Youth Interview are 
examined and the size of the sample arising from the 
use of multiplicity sampling is investigated. These
estimates are then used to speculate about the
resulting sample sizes for surveys with different
screening sample sizes.

Household Respondent Interview 
Completion Rates 

One of the goals of the experiment was to examine 
the increase in the sample size for 14- to 21-year-olds 
as a result of the use of the multiplicity sample. This 
can be evaluated by looking at the number of cases 
and the completion rates by household status (in- 
household or out-of-household). 

In the Field Test, the HRI was completed for nearly 
all youths regardless of the household status. The 
completion rate for the in-household youths was 97.1 
percent. For the out-of-household youths, the 
completion rate was 96.5 percent. The proportion of 
the nonrespondents who were refusals (as opposed to 
another type of nonresponse such as not located, or 
language problem, etc.) were also nearly equal, 2.0 
percent for in-household students and 1.5 percent for 
out-of-household students. These completion rates 
and the number of eligible HRI cases that were 
identified in the Screener by the household status and
response status are shown in the detailed tables in
Appendix A.

The results indicate that there is no appreciable 
difference in response rates by household status for 
the HRI. Because of the very high completion rate, 
no further analysis of the HRI completion rate is 
required. The other implication of this result is that 
subsequent analysis of the multiplicity sampling 
(including Youth Interview completion rates and 
sample sizes) can be based upon the completed 
HRIs without significant distortion due to differential
nonresponse from the HRI. This result has added 
significance because the sampling rates for the Youth 
Interview were determined from HRI data. 



Household Respondent Sample Size and
Characteristics

The multiplicity sampling experiment resulted in the 
inclusion of 192 youths with completed HRIs who
would not have been included in the sample
otherwise. Since the multiplicity sample was only
used in one-fourth of the sample households, we can
estimate that the sample size would have been about
770 out-of-household youths if the multiplicity sample
was used in all 15,000 households. This amounts to
about 16 percent of all the 14- to 21-year-olds
identified in a survey with 15,000 screened 
households. For samples of other sizes, the 
estimated sample sizes can be found using these 
proportions. For example, if 60,000 households were
screened and multiplicity sampling procedures were 
used in all the households, then a sample of about 
3,00 out-of-household youths would be expected with 
a total sample of 19,600 youths.
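
The scaling described above can be sketched as follows, assuming the yield of out-of-household youths is proportional to the number of screened households. The rounded figure of 15,000 screened households and the function name are choices made for this illustration.

```python
FIELD_TEST_HOUSEHOLDS = 15_000     # approximate number of screened households
OUT_OF_HOUSEHOLD_COMPLETES = 192   # out-of-household youths with completed HRIs
SUBSAMPLING_RATE = 0.25            # share of households in the experiment


def projected_out_of_household(screened_households):
    """Expected out-of-household youths if every household were included."""
    # Yield at the Field Test scale had all households been in the experiment.
    full_yield = OUT_OF_HOUSEHOLD_COMPLETES / SUBSAMPLING_RATE
    # Assumed: the yield grows in proportion to the households screened.
    return full_yield * screened_households / FIELD_TEST_HOUSEHOLDS


print(projected_out_of_household(15_000))
```

At the Field Test scale this gives 768, which the text rounds to about 770.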

Since the multiplicity sample was implemented with 
the hope of increasing the sample size for certain 
youth (dropouts and those without telephones), the 
characteristics of the out-of-household youths are 
important indicators of the success of the procedure. 
First, the estimates of all youth by household status 
are discussed, and then a few of the characteristics of 
the youth by household status are explored. 

The percent of 14- to 21-year-olds that are classified
as out-of-household is estimated to be 7.7 percent.
This is an estimate of the number of 14- to 21-year-
olds not currently living with their mothers who
would be identified through the mother's household.
Note that this differs from the estimated 16 percent 
of the sample that are out-of-household because 
youths in domain C have two chances of being in the 
sample. 

One way of looking at the impact of the multiplicity 
sample for dropouts is to examine the percent of all
status dropouts and all event dropouts who are out- 
of-household youth. Figure 3 shows estimated 
percents of out-of-household youth with approximate 
95 percent confidence intervals. The relative 
usefulness of the multiplicity sampling for status 
dropouts is evident from this figure, since the percent 
of out-of-household youth who are status dropouts is
larger than the percent of the total. The same is not 
true for event dropouts. 

Another way of seeing the impact of the multiplicity 
sample for different types of youth is to compare the 
percent of youth by household status. Figure 4 shows 
this type of comparison for youth by their age. 
Youths 14 and 15 years old account for a small 
proportion of the out-of-household youth population, 
but for about one-fourth of the in-household youth.
This result indicates that multiplicity sampling is
likely to increase the sample size for older youth
more than for younger ones. The same type of
analysis (see table A-2) reveals that the multiplicity
sample is also effective for increasing the sample size
for those not currently enrolled in elementary or
secondary school.

Figure 3. - Estimated 95 percent confidence interval for
percent out-of-household youth, by dropout status

Figure 4. - Estimated percent of youth by household
status and age

These findings suggest that multiplicity sampling for 
14- to 21-year-olds is reasonably effective in
increasing the sample size for older youth and those 
who are not currently enrolled in elementary or 
secondary school. The increase in sample size is 
large for status dropouts but not for event dropouts. 
These findings are consistent with the expected 
benefits of multiplicity sampling. 

The other desirable feature of the multiplicity sample 
is the ability to include youth who otherwise would
not be covered in a telephone survey. This issue is 
addressed after the completion rates for the Youth 
Interview are described. 



Youth Interview Completion Rates 

A subsample of the 14- to 21-year-olds for whom
HRIs were completed was selected for the Youth
Interview. The rates used for the subsampling 
depended upon responses to items in the HRI. All 
of the youth classified as dropouts were included in 
the sample for the Youth Interview. About 20 
percent of the remaining youths were randomly 
sampled for the Youth Interview. The sampling 
algorithm used for in-household and out-of- 
household youth was identical. 
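
The subsampling rule described above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm actually run in the CATI system: the record layout, the `dropout` field, the seed, and the function name are all hypothetical.

```python
import random

NON_DROPOUT_RATE = 0.20   # approximate sampling rate for non-dropouts


def select_for_youth_interview(youths, rng):
    """Keep every dropout and a random share of the remaining youths."""
    selected = []
    for youth in youths:
        # Dropouts are taken with certainty; others with probability 0.20.
        if youth["dropout"] or rng.random() < NON_DROPOUT_RATE:
            selected.append(youth)
    return selected


rng = random.Random(1989)   # fixed seed so the illustration is repeatable
# Made-up records: 1,000 youths with completed HRIs, one in ten a dropout.
youths = [{"id": i, "dropout": i % 10 == 0} for i in range(1000)]
sample = select_for_youth_interview(youths, rng)
print(len(sample), sum(y["dropout"] for y in sample))
```

The same function is applied regardless of household status, mirroring the statement that the algorithm was identical for in-household and out-of-household youth.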

Figure 5 shows the percent of youth not responding 
to the Youth Interview by household status. The 
most striking result is that the percent not completing 
the interview is much lower for the in-household 
youths than for the out-of-household youths. Much 
of the difference in completion rates can be 
accounted for by noting that the vast majority of the 
nonresponses are not refusals, but fall into the "other 
nonresponse" category. This category includes youth
who could not be reached by telephone and those for
whom the household respondent did not provide
locating information. Nearly half of the cases of
"other nonresponse" are youths who did not live in
telephone households. In fact, it is somewhat
remarkable that complete Youth Interviews were
obtained for 41 percent of the youths who did not
live in telephone households.

Figure 5. - Percent not responding to Youth Interview,
by household status and reason for nonresponse

Across the other characteristics of youths, there is 
not very much variability in completion rates for the
in-household youths (see table A-3). The completion 
rate varies from a low of 86 percent to a high of 93 
percent and the refusal rate varies from a low of 3 
percent to a high of 6 percent. The completion rates 
for out-of-household youths also reveal little 
substantial variability across the characteristics of the 
youth. 

The findings indicate that there are significant 
problems associated with locating and obtaining
completed interviews for out-of-household youths.
These results should be considered in conjunction
with the comparison of the dropout reporting in the
HRI and Youth Interview. That comparison
indicated that the classification of youths as status 
dropouts from the HRI corresponded fairly well with 
the classification based on the Youth Interview 
responses. Since more of the out-of-household 
youths are status dropouts rather than event 
dropouts, these results together suggest that 1) the 
multiplicity sample is useful for enlarging the sample 
of status dropouts, and 2) the HRI is sufficient for 
classifying these persons as dropouts. 

The results do cast doubt about the usefulness of 
trying to conduct an extended telephone interview 
with out-of-household youths. If a Youth Interview 
with a high response rate is necessary, then 
significant additional resources (locating resources 
and personal interview resources) may be needed to 
obtain an acceptable completion rate. 
Estimates of Increased Coverage

The second objective of the multiplicity sample was 
to increase the coverage of the population of 14- to 
21-year-olds. The coverage is increased because the 
multiplicity sample provides estimates for youths 
living in nontelephone households if their mothers 
live in a telephone household (domain F in figure 2). 
As noted previously, the multiplicity sample does not 
eliminate undercoverage bias completely. Youths 
categorized into domains D and E are still not 
covered under this procedure. 

Figure 6 shows the estimated percent of youths who 
are covered in NHES only because of the use of the 
multiplicity sample (domain F) for all youths and 
dropouts, i.e., the estimated percent of all 14- to 21- 
year-olds who live in nontelephone households but 
have a mother living in a telephone household. The 
estimated percent is an indicator of the reduction in 
the undercoverage bias due to the multiplicity
sample. In other words, it is the percent gain in
coverage due to the use of multiplicity sampling. 

Technically, the percent bias is the estimated number 
of persons in domain F divided by the estimated 
aggregate number of persons in all domains. 
Because the NHES Field Test was weighted up to 
the total number of 14- to 21-year-olds in the U.S., 
the cases in domain F and the other domains were 
subjected to differential adjustments. These 
adjustments were introduced to reduce partially the 
impact of the undercoverage bias. Therefore, the 
estimated percent without telephones shown in figure 
6 (and detailed in table A-4) is only an 
approximation of the actual bias reduction from 
multiplicity sampling. 

To provide a better estimate of the bias, estimates 
were computed using the weights prior to the 
introduction of poststratification and bias reduction 
adjustments. These bias estimates are given in
table A-5. The estimates of the percent bias using
the unadjusted weights do not differ very much from
the simpler approximations displayed in figure 6.

Figure 6. - Estimated percent of youth covered due
to multiplicity sample
[Vertical axis: estimated percent of youth in domain F]

The examination of the reduction in the 
undercoverage bias begins by looking at estimates of
all youths. An estimated 5 percent of the 27,697,000 
14- to 21-year-olds in the U.S. are out-of-household 
youth without telephones. These youth are only 
covered because of the multiplicity sample. Since 92 
percent of all 14- to 21-year-olds live in telephone 
households, the multiplicity sample eliminates
approximately half of the undercoverage bias for 
estimates of all youths. 

Multiplicity sampling was used in the Field Test 
primarily because dropouts were subject to much 
higher undercoverage rates. Status dropouts have a 
telephone coverage rate of only about 70 percent and 
event dropouts have a telephone coverage rate of 
about 75 percent. The estimated percent bias for 
status dropouts is 15 percent and for event dropouts, 
only 4 percent. Even though the estimated 15
percent bias for status dropouts is larger than the 5
percent for all youth, it still only represents half of
the undercoverage bias for this group of youth. The
reduction in the undercoverage bias for event
dropouts is very small, especially compared to the
total undercoverage bias for this group.
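
A rough way to read these figures together is to divide the share covered only through the multiplicity sample by the share living outside telephone households. The sketch uses only the rounded percentages quoted in the text, so the ratios are approximate, and the names are illustrative.

```python
# (telephone coverage percent, percent covered only via the multiplicity sample)
groups = {
    "all youths": (92, 5),
    "status dropouts": (70, 15),
    "event dropouts": (75, 4),
}


def share_recovered(coverage_pct, domain_f_pct):
    """Fraction of the telephone undercoverage picked up through domain F."""
    undercoverage_pct = 100 - coverage_pct
    return domain_f_pct / undercoverage_pct


shares = {name: share_recovered(c, f) for name, (c, f) in groups.items()}
for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

The ratios for all youths and for status dropouts fall in the neighborhood of one-half, while the ratio for event dropouts is much smaller, in line with the discussion above.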

The estimated bias for 20- to 21-year-olds is 11
percent, which is larger than for any other 
characteristic of youth considered, except for status 
dropout. This result suggests that multiplicity 
sampling may be useful for persons in this age group 
for topics other than dropouts. 

The estimates of percent bias indicate that the 
multiplicity sample does reduce the undercoverage 
bias for all 14- to 21-year-olds and for status 
dropouts by approximately one-half. The impact for 
event dropouts is not substantial. The reduction in 
bias is only one of the statistical measures of the 
efficiency of the multiplicity sample. A more 
complete measure which includes the bias reduction 
is discussed in the next section. 



Mean Square Errors of Estimates 

The analysis in the preceding sections described the 
increases in sample size and coverage resulting from 
the use of multiplicity sampling. The mean square
error of the estimate combines these two aspects into
one measure of statistical accuracy of the estimates. 

The mean square error of an estimate (MSE) is the 
sum of the variance and the square of the bias of the 
estimate. The variance of the estimate decreases as 
the sample size increases, while the size of the 
undercoverage bias is unaffected by the sample size. 
The undercoverage bias decreases as the result of 
using multiplicity sampling. 

The mean square errors are needed for estimates 
both with and without the multiplicity sample. The 
mean square errors of these two estimates contain 
some identical bias contributions coming from the 
bias associated with domains D and E in figure 2. 
Before discussing this common component, the 
formulation of the mean square errors for the two 
estimates is given. 

The mean square error for an estimate can be
written as

$$\mathrm{MSE}(z') = S_{z'}^2 + B_{z'}^2$$

where $S_{z'}^2$ is the variance of z' and $B_{z'}$ is the bias
of z'. An unbiased estimate of the mean square
error from a sample can be found by replacing the 
variance and bias squared terms with unbiased 
estimates. The estimated mean square error can be 
written as 

$$\mathrm{mse}(z') = s_{z'}^2 + b_{z'}^2 - s_{b_{z'}}^2$$

where the first term on the right-hand side is the
estimate of the variance of z', the second term
is the square of the estimate of the bias of z',
and the third term is the estimate of the variance of
the bias estimate. If the sum of the last two terms is
negative (which can happen for small bias estimates
and relatively small sample sizes), then the MSE is
estimated by letting the last two terms be equal to
zero.
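
The estimation rule just described can be sketched in Python as follows. The function name and the inputs are hypothetical and chosen only to show both cases; the sketch assumes the three component estimates have already been computed.

```python
def estimated_mse(var_estimate, bias_estimate, var_of_bias_estimate):
    """Estimate the MSE as variance + (bias**2 - variance of the bias estimate).

    When bias**2 minus the variance of the bias estimate is negative, the
    two bias-related terms are replaced by zero, as described in the text.
    """
    bias_terms = bias_estimate ** 2 - var_of_bias_estimate
    return var_estimate + max(bias_terms, 0.0)


# Hypothetical inputs chosen only to exercise both branches.
print(estimated_mse(1e-5, 0.010, 5e-6))  # bias terms positive, kept
print(estimated_mse(1e-5, 0.001, 5e-6))  # bias terms negative, set to zero
```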

Let x' be the estimate from the sample excluding 
the multiplicity sample and y' be the estimate of the 
sample including the multiplicity sample. The 
estimate y' contains a component for domain F, but
neither x' nor y' estimates the component of the
bias associated with undercoverage in domains D and
E. These components cannot be estimated from the
Field Test data. The only estimates of this bias come 
from the analysis of other data sources, such as the 
Current Population Survey (CPS) data discussed 
earlier. 

The MSE's for x' and y' are both affected by the
omission of the component of the variance associated
with domains D and E. In comparing these two 
estimators, the component of the bias associated with 
these two domains has been ignored, since it is not 
estimable. 

Now, the estimators of MSE for x' and y' can be
written. For y', the estimator can be approximated by

$$\mathrm{mse}(y') = \frac{D\,s^2}{n_y}$$

where D is the design effect, $s^2$ is the unit variance of
the estimate, and $n_y$ is the sample size for this estimate.
Note that there are no bias terms in this estimator.

The estimator for x' is approximated by

$$\mathrm{mse}(x') = \frac{D_1\,s^2}{n_x} + b^2 - \frac{s_b^2}{n_F}$$

where the terms are defined as before, b is the estimated
bias, and the last term on the right-hand side of the
equation is the estimate of the variance of the bias
estimate. The sample size for the estimated bias, $n_F$,
is the number of cases in domain F. Note that in this
formulation the population variances are assumed to be
equal.

The formulas for the estimated MSE for x' and y'
have provisions for different design effects, D and $D_1$.
A design effect is used to approximate the ratio of the
variance of the estimate from a complex sample to the
variance that would have been expected from a simple
random sample of the same size. The design effect for
y' (D) should be greater than $D_1$ because the youths in
domain C are effectively oversampled by a factor of
two in the multiplicity sample. From the Field Test
we estimate that the approximate value of D is 1.6 and
the approximate value of $D_1$ is 1.5.

The other quantities needed to estimate the MSE of x'
and y' can also be estimated from the Field Test. If
we assume that 60,000 households are screened, then
about 19,600 youths are expected ($n_y$) if multiplicity
sampling is used in all households. If it is not used,
then the expected sample size is about 17,000 ($n_x$).
The sample size for domain F ($n_F$) is approximately
220 (7 percent of the estimated 3,00 out-of-household
youths expected) in this scenario. These estimated
sample sizes were derived in Section 3.2.''

The size of the bias depends upon the characteristic
being estimated. Estimates of the percent bias for
various characteristics are shown in table A-5. These
can be converted to totals by multiplying by the
appropriate totals given in table A-4. For example,
the estimated bias for status dropouts is 346,000 youths
(.149 times 2,323,000 status dropouts).
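
The conversion of a percent bias into a bias in totals, and then into a bias in the estimated proportion of all youths, can be sketched in Python as follows. The function names are hypothetical; the percent bias (.149) and the totals are the ones quoted in the text for status dropouts.

```python
def bias_in_total(percent_bias, subgroup_total):
    """Convert a percent bias (as a fraction) into a bias in number of youths."""
    return percent_bias * subgroup_total


def bias_in_proportion(percent_bias, subgroup_total, population_total):
    """Express the same bias as a proportion of all youths."""
    return bias_in_total(percent_bias, subgroup_total) / population_total


STATUS_DROPOUTS = 2_323_000  # estimated status dropouts quoted in the text
ALL_YOUTHS = 27_697_000      # estimated 14- to 21-year-olds quoted in the text

total_bias = bias_in_total(0.149, STATUS_DROPOUTS)
print(f"{total_bias:,.0f} youths")  # roughly 346,000
print(f"{bias_in_proportion(0.149, STATUS_DROPOUTS, ALL_YOUTHS):.3f}")
```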



To continue this example, the MSE for x' and y' will 
be estimated for the percent of all youths who are 
status dropouts. An estimated 8 percent 
(2,323,000/27,697,000) of all youth are status 
dropouts. The estimated bias is .012 
(346,000/27,697,000) for this statistic. The estimated 
MSE's are given by

$$\mathrm{mse}(y') = \frac{1.6\,(0.08)(1-0.08)}{20{,}000} = 6.1 \times 10^{-6}$$

$$\mathrm{mse}(x') = \frac{1.5\,(0.08)(1-0.08)}{17{,}000} + (.012)^2 - \frac{(.012)(1-.012)}{220} = 106.7 \times 10^{-6}$$



For this statistic, the estimated bias term dominates the 
MSE of x'. As a result, the MSE for y' is much 
smaller than that of x'. The ratio of the estimated 
MSE for x' to y' is 17 (107/6), indicating the 
superiority of y' for this statistic. The ratios for other 
estimates of percents are shown in table 1 (others are 
shown in table A-5). 
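
The status dropout example can be reproduced approximately with the short Python sketch below. The function names are hypothetical, and the sketch assumes that the unit variance of a proportion is P(1-P) and that the variance of the bias estimate is b(1-b) divided by the domain F sample size, as in the worked figures. It carries the unrounded proportions (2,323,000/27,697,000 and 346,000/27,697,000) through the calculation, so the results are close to, but need not match exactly, the published 6.1 and 106.7 millionths.

```python
def mse_with_multiplicity(p, n_y, deff):
    """mse(y') = D * p(1 - p) / n_y; this estimator has no bias term."""
    return deff * p * (1 - p) / n_y


def mse_without_multiplicity(p, n_x, deff_1, bias, n_f):
    """mse(x') = D1 * p(1 - p) / n_x + b**2 - b(1 - b) / n_F.

    The last two terms are set to zero if their sum is negative.
    """
    variance = deff_1 * p * (1 - p) / n_x
    var_of_bias = bias * (1 - bias) / n_f
    return variance + max(bias ** 2 - var_of_bias, 0.0)


ALL_YOUTHS = 27_697_000
p = 2_323_000 / ALL_YOUTHS   # proportion of youths who are status dropouts
bias = 346_000 / ALL_YOUTHS  # estimated bias of that proportion

mse_y = mse_with_multiplicity(p, n_y=20_000, deff=1.6)
mse_x = mse_without_multiplicity(p, n_x=17_000, deff_1=1.5, bias=bias, n_f=220)
ratio = mse_x / mse_y

print(f"mse(y') = {mse_y * 1e6:.1f} millionths")
print(f"mse(x') = {mse_x * 1e6:.1f} millionths")
print(f"ratio   = {ratio:.1f}")
```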

The ratios of the estimated MSE's are large either
when the estimated percent bias is larger than average,
or when the estimate is a large percent of the total.
The first condition arises because of the dominance of
the bias term as shown in the example for status
dropouts. The second condition arises because the
variance of a proportion (P) approaches zero as P
approaches either zero or one (the variance is
P(1-P)/n). When the variance approaches zero the
bias again becomes the dominant term in the
estimate of the MSE.
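
A small Python sketch, using only the P(1-P)/n expression given above, illustrates how quickly the variance falls as P moves toward one. The function name is hypothetical and the values of P are chosen for illustration; the sample size of 17,000 is the one used in the text for the design without the multiplicity sample.

```python
def proportion_variance(p, n):
    """Simple random sampling variance of an estimated proportion."""
    return p * (1 - p) / n


n = 17_000  # sample size without the multiplicity sample, from the text
for p in (0.50, 0.92, 0.99, 0.999):
    print(f"P = {p:.3f}: variance = {proportion_variance(p, n):.2e}")
```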

Table 1. Estimated ratio of mean square error without
multiplicity sampling to that with multiplicity
sampling

                                   Ratio of
Characteristic of youth       MSE(x') to MSE(y')

Age
  14 to 15 years                        1.1
  16 to 17 years                        1.0
  18 to 19 years                       10.9
  20 to 21 years                       44.0

Race/ethnicity
  Hispanic                              2.9
  Black, Non-Hispanic                   7.8
  Non-Black, Non-Hispanic              69.1

Status dropouts
  Yes                                  17.3
  No                                  210.9

Event dropouts
  Yes                                   1.6
  No                                1,324.6

Source: 1989 National Household Education Survey Field
Test

The ratios depend upon the relative contribution of
the variance and the bias estimates. As noted in the 
status dropout example, the bias term is dominant 
since it is about 15 times the size of the variance 
term. For the estimate of persons who are not status 
dropouts, this relationship is even more exaggerated 
because the bias term is larger than for status 
dropouts (the variance term is the same). The bias 
term is larger because it is the product of the 
estimated percent bias and the percent of the 
population in the subgroup. Status dropouts 
constitute only 8 percent of the population and
non-dropouts are 92 percent. As a result, the ratios for
non-dropouts, and for other characteristics shared by
large proportions of the population, tend to be very
large.
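
This product can be illustrated with a short Python sketch. The function name is hypothetical; the percent biases (14.9 and 4.2 percent) are taken from table A-5 and the population shares (8 and 92 percent) from the text, all rounded, so the results are approximate.

```python
def bias_as_share_of_population(percent_bias, subgroup_share):
    """Bias of an estimated proportion: percent bias times subgroup share."""
    return percent_bias * subgroup_share


dropouts = bias_as_share_of_population(0.149, 0.08)      # status dropouts
non_dropouts = bias_as_share_of_population(0.042, 0.92)  # not status dropouts

print(f"status dropouts:     bias = {dropouts:.4f}")
print(f"not status dropouts: bias = {non_dropouts:.4f}")
print(f"ratio of squared biases = {(non_dropouts / dropouts) ** 2:.1f}")
```

Although the percent bias is much smaller for non-dropouts, the larger population share makes the bias of the estimated proportion, and hence the squared bias term in the MSE, considerably larger.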

The ratios of the MSE's show that the multiplicity 
sample has a significant positive impact on estimates 
of older youths, but very little impact on estimates of
younger youths. The multiplicity sample is effective
in improving the accuracy of the estimates of status
dropouts, but for event dropouts it has very little
impact. These results are consistent with those
reported earlier and with the expected benefits of
multiplicity sampling in this population.

Summary and Recommendations 

The analysis of the Field Test data indicates that the 
multiplicity sample is effective in increasing the
sample size of 14- to 21-year-olds included in the 
survey. The multiplicity sample results in more older 
youths and status dropouts, but does little to add to 
the sample size of the younger youths and the event 
dropouts. 

The completion rates for the HRI and Youth 
Interview reveal that mothers are willing to provide 
the information for the youths who no longer reside 
in their households, but it is difficult to contact these 
youths for extended interviews. The primary 
difficulty in obtaining extended interviews is that 
many of the youths do not have telephones in their 
homes. 

The data from the Field Test also show that the 
multiplicity sample is effective in reducing the 
undercoverage bias for some statistics but not for 
others. The procedure is most effective for status 
dropouts, older youths, and youths not currently 
enrolled in elementary or secondary school. In these 
cases the bias is approximately halved. On the other 
hand, for younger youths and event dropouts the 
procedure does not significantly reduce the 
undercoverage bias. The estimated mean square 
errors confirm these findings. 

Since the cost of screening households to find those 
with 14- to 21-year-old members (less than one in 
four households) is large relative to the cost of 
conducting the HRI, it is recommended that the 
multiplicity sampling approach be implemented in 
any future survey on dropouts. The cost and 
effectiveness of obtaining extended interviews with 
out-of-household youths suggest that those resources 
might be better allocated to other problem areas if 
the goal is to estimate the percent who are dropouts. 
This recommendation is consistent with the finding 
that for status dropouts and older youths (the groups 
captured most frequently in the multiplicity sample), 
the HRI is a reliable data source. On the other
hand, if other characteristics of status dropouts which
cannot be obtained from the household respondent 
are important, then extended interviews with the 
youths themselves are necessary. 

The estimates of the mean square errors also reveal 
the importance of the undercoverage bias for 
estimates of dropouts. In a full-scale implementation 
of the NHES for dropouts, special procedures should 
be implemented to reduce the impact of this bias. 
One of the procedures available is multiplicity 
sampling, as discussed in this report. Another 
method is to implement estimation procedures, such
as those used in the Field Test, to reduce the bias.
A third, more satisfying but also more costly, 
procedure is to implement a dual frame survey to 
cover non-telephone households. The dominance of 
the bias terms in the estimates suggests that it would 
be advisable to reduce the telephone sample size 
somewhat if those resources could be used to 
eliminate undercoverage bias. These 
recommendations are specific for the sampling and 
estimation of the number of dropouts and the 
characteristics of dropouts. For other populations 
and statistics, the optimal procedures may be very 
different from those suggested for dropouts.

5. About 80 percent of the sample of youths from domain C should have been derived from the
in-household sample because of the one-in-four sampling of households for the multiplicity sample. This
is how the original factors of .8 for the in-household sample and .2 for the out-of-household sample were 
derived. 

6. Another approach is to estimate the number of additional households and youths per household that 
are included as a result of the multiplicity sample. Estimates from the Field Test show that an 
additional 5 percent of households with about 1.1 youths per household are included from multiplicity 
sampling. The multiplicity sample also raises the average number of youths per household very slightly 
for households with in-household youths. The net result is consistent with the 8 percent increases noted 
above. 

7. A status dropout is defined as a 14- to 21-year-old who was not enrolled in October of the current year
and did not have a high school diploma or equivalent. Event dropouts are defined as the subset of 
status dropouts who were enrolled in school in October of the previous year. In other words, a status 
dropout is someone who is not currently enrolled and does not have a diploma or equivalent, and an 
event dropout is a dropout who left school within the last year. 

8. Interviews with youths who do not live in telephone households were accomplished by obtaining work 
telephone numbers or telephone numbers of friends whom they frequently visit. In addition, telephone 
calls to the mothers' households were attempted during the Thanksgiving weekend to interview the 
youths at that location. 

9. See Proxy Reporting of Dropout Status in the NHES Field Test for details on the reporting of dropout
status by self/proxy in the NHES Field Test. 

10. See Telephone Undercoverage Bias of 14- to 21-Year-Olds and 3- to 5-Year-Olds for the estimates of the
telephone undercoverage rates based upon CPS data. 

11. These comparisons are based on designs which have equal numbers of screened households. A different 
approach to evaluating the effectiveness of the designs is to fix the total cost of data collection and 
compare the mean square errors of the estimates. Under a fixed cost scenario, the number of screened 
households would be significantly decreased and the multiplicity sampling approach would be less 
favorable than that given in the fixed screened households approach. 
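
The dropout definitions in note 7 can be expressed as a short Python sketch. The class and field names are hypothetical and do not correspond to variables on the Field Test data file; the sketch only encodes the definitions as stated.

```python
from dataclasses import dataclass


@dataclass
class Youth:
    age: int
    enrolled_this_october: bool      # enrolled in October of the current year
    enrolled_last_october: bool      # enrolled in October of the previous year
    has_diploma_or_equivalent: bool  # high school diploma or equivalent


def is_status_dropout(youth):
    """14- to 21-year-old, not currently enrolled, no diploma or equivalent."""
    return (
        14 <= youth.age <= 21
        and not youth.enrolled_this_october
        and not youth.has_diploma_or_equivalent
    )


def is_event_dropout(youth):
    """Status dropout who was enrolled in October of the previous year."""
    return is_status_dropout(youth) and youth.enrolled_last_october


recent_leaver = Youth(17, False, True, False)
earlier_leaver = Youth(20, False, False, False)
graduate = Youth(19, False, True, True)

print(is_status_dropout(recent_leaver), is_event_dropout(recent_leaver))
print(is_status_dropout(earlier_leaver), is_event_dropout(earlier_leaver))
print(is_status_dropout(graduate), is_event_dropout(graduate))
```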





APPENDIX 



Detailed Tables 





Table A-1. Number of household respondent interviews, by household status and response status

                                    Response status
                                            Nonresponse
Household status    Total  Completes  Total  Refusals  Other*  Completion rate  Refusal rate

Total               4,441      4,313    128        67      61            97.1%          1.5%
In-household        4,242      4,121    121        63      58            97.1           1.5
Out-of-household      199        192      7         4       3            96.5           2.0

*Includes not-located youths, language problems, and miscellaneous categories of nonresponse.
Source: 1989 National Household Education Survey Field Test
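
The rates in table A-1 appear to be the number of completes and the number of refusals, each divided by the total number of cases and expressed as a percent. That reading is an assumption inferred from the counts rather than a definition stated in the report. A minimal Python sketch (function names hypothetical) that recomputes the rates from the counts is:

```python
def completion_rate(completes, total):
    """Completed interviews as a percent of all cases."""
    return 100 * completes / total


def refusal_rate(refusals, total):
    """Refusals as a percent of all cases."""
    return 100 * refusals / total


# (total, completes, refusals) taken from table A-1
rows = {
    "Total": (4441, 4313, 67),
    "In-household": (4242, 4121, 63),
    "Out-of-household": (199, 192, 4),
}

for status, (total, completes, refusals) in rows.items():
    print(
        status,
        round(completion_rate(completes, total), 1),
        round(refusal_rate(refusals, total), 1),
    )
```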

Table A-2. Estimated number of 14- to 21-year-olds by household status and selected characteristics

                                               In-household            Out-of-household
                                Total       Number                    Number
Selected characteristics     (thousands)  (thousands)      Percent  (thousands)      Percent

Total                             27,697       25,552        100.0        2,145        100.0

Age
  14 to 15 years                   6,571        6,471         25.3          100          4.7
  16 to 17 years                   6,767        6,587         25.8  [illegible]  [illegible]
  18 to 19 years                   7,385        6,732         26.3          653         30.4
  20 to 21 years                   6,974        5,761         22.5        1,213         56.5

Race/ethnicity*
  Hispanic                         2,784        2,588         10.1  [illegible]          9.1
  Black, Non-Hispanic              4,060        3,709         14.5          351         16.4
  Non-Black, Non-Hispanic         20,736       19,147         74.9        1,589         74.1

Gender
  Male                            13,897       12,920         50.6          977         45.5
  Female                          13,800       12,632         49.4        1,168         54.5

Elementary/secondary enrollment
  Currently enrolled              13,477       13,204         51.7          273         12.7
  Currently not enrolled          14,220       12,348         48.3        1,872         87.3

Status dropouts
  Yes                              2,323        1,910          7.5          413         19.3
  No                              25,374       23,642         92.5        1,732         80.7

Event dropouts
  Yes                                587          556          2.2           31          1.5
  No                              27,110       24,996         97.8        2,114         98.5

*Does not add to total because of missing values for race or ethnicity.
Source: 1989 National Household Education Survey Field Test

Table A-3. Number of Youth Interviews by household status, response status, and selected characteristics

                                               In-household                                  Out-of-household
                                                         Nonresponse                                      Nonresponse
Selected characteristics     Number  Completes  Total     Refusals       Other*   Number  Completes  Total  Refusals  Other*

Total                         1,721      89.3%  10.7%         4.4%         6.3%      131      51.1%  48.9%      6.9%   42.0%

Age
  14 to 15 years                217      92.6    7.4          2.8          4.6         1     100.0    0.0       0.0     0.0
  16 to 17 years                302      89.4   10.6          4.6          6.0         5      40.0   60.0      20.0    40.0
  18 to 19 years                576      88.0   12.0          4.5          7.5        41      51.2   48.8       2.4    46.3
  20 to 21 years                626      89.3   10.7          4.8          5.9        84      51.2   48.8       8.3    40.5

Race/ethnicity**
  Hispanic                      194      86.6   13.4          5.7          7.7        11      27.3   72.7      18.2    54.5
  Black, Non-Hispanic           240      90.0   10.0  [illegible]  [illegible]        12      33.3   66.7       0.0    66.7
  Non-Black, Non-Hispanic     1,278      89.6   10.4          4.3          6.1       105      55.2   44.8       6.7    38.1

Gender
  Male                          846      88.1   11.9          4.5          7.4        62      58.1   41.9       3.2    38.7
  Female                        875      90.5    9.5          4.3          5.1        69      44.9   55.1      10.1    44.9

Elementary/secondary enrollment
  Currently enrolled            463      91.4    8.6          3.7          5.0         4     100.0    0.0       0.0     0.0
  Currently not enrolled      1,258      88.6   11.4          4.7          6.8       127      49.6   50.4       7.1    43.3

Status dropouts
  Yes                           275      87.3   12.7          6.2          6.5        31      38.7   61.3       9.7    51.6
  No                          1,446      89.7   10.3          4.1          6.2       100      55.0   45.0       6.0    39.0

Event dropouts
  Yes                            79      86.1   13.9          3.8         10.1         2       0.0  100.0       0.0   100.0
  No                          1,642      89.5   10.5          4.4          6.1       129      51.9   48.1       7.0    41.1

*Primarily youths with language problems, those who did not live in telephone households, and those for whom no locating information
was provided.
**Does not add to total because of missing values for race or ethnicity.
Source: 1989 National Household Education Survey Field Test

Table A-4. Estimated number of 14- to 21-year-olds by household status, telephone presence, and selected characteristics

                                                             Out-of-household          Percent of
                                          In-household                                 14- to 21-year-olds
                                   Total         total   With phones  Without phones   without
Selected characteristics     (thousands)   (thousands)   (thousands)     (thousands)   phones

Total                             27,697        25,552           786           1,359      4.9%

Age
  14 to 15 years                   6,571         6,471            16              84      1.3
  16 to 17 years                   6,767         6,587            81              98      1.5
  18 to 19 years                   7,385         6,732           238             414      5.6
  20 to 21 years                   6,974         5,761           450             762     10.9

Race/ethnicity*
  Hispanic                         2,784         2,588            56             139      5.0
  Black, non-Hispanic              4,060         3,709            68             283      7.0
  Nonblack, non-Hispanic          20,736        19,147           652             937      4.5

Gender
  Male                            13,897        12,920           338             639      4.6
  Female                          13,800        12,632           448             720      5.2

Elementary/secondary enrollment
  Currently enrolled              13,477        13,204           102             171      1.3
  Currently not enrolled          14,220        12,348           684           1,188      8.4

Status dropouts
  Yes                              2,323         1,910           114             299     12.9
  No                              25,374        23,642           672           1,060      4.2

Event dropouts
  Yes                                587           556             9              22      3.8
  No                              27,110        24,996           777           1,337      4.9

*Data do not add to total because of missing values for race or ethnicity.
Source: 1989 National Household Education Survey Field Test

Table A-5. Estimated bias, mean square errors, and relative errors for estimators, by selected characteristics

                           Estimated    Estimated MSE(x')*    Estimated MSE(y')*
                           percent      without multiplicity  with multiplicity     MSE(x')/
Selected characteristics   bias         sample (millionths)   sample (millionths)   MSE(y')

Age
  14 to 15 years             0.9%               15.9                14.5                1.1
  16 to 17 years             1.6                14.1                14.8                1.0
  18 to 19 years             5.5               170.0                15.6               10.9
  20 to 21 years            11.0               663.3                15.1               44.0

Race/ethnicity
  Hispanic                   6.5                20.8                 7.2                2.9
  Black, Non-Hispanic        7.3                77.6                10.0                7.8
  Non-Black, Non-Hispanic    4.6             1,040.7                15.1               69.1

Gender
  Male                       4.9               524.0                20.0               26.2
  Female             [illegible]               551.0                20.0               27.6

Elementary/secondary enrollment
  Currently enrolled         1.2                30.6                20.0                1.5
  Currently not enrolled     8.4             1,688.7                20.0               84.5

Status dropouts
  Yes                       14.9               106.7                 6.1               17.3
  No                         4.2             1,296.6                 6.1              210.9

Event dropouts
  Yes                        4.4                 2.7                 1.7                1.6
  No                         5.0             2,197.0                 1.7            1,324.6

*The estimated MSE's exclude the bias associated with youths having mothers not living in telephone households.
Source: 1989 National Household Education Survey Field Test